The following data covers building damage from the New Orleans tornado of December 2022. The data was obtained from Verisk Analytics and was derived using computer vision and machine learning applied to post-catastrophe aerial imagery. There are approximately 42,000 buildings in this dataset.
Here are some interactive before-and-after aerial images:
This is an example of a building that has a catastrophe score of 100 (FEMA 6 / Destroyed)
This is another example of a building that has a catastrophe score of 100 (FEMA 6 / Destroyed)
This is an example of a building that has a catastrophe score of approximately 60 (FEMA 4 / Major)
I converted roofsolar into a logical (TRUE/FALSE) column by recoding “SOLAR PANEL” as TRUE and “NO SOLAR PANEL” as FALSE. In addition, I set roofshape to NA wherever the computer's confidence score was below 0.80 (more than a 20% chance of being incorrect). Some cells in damage_level contained an empty string, so I converted those to NA as well. I then separated longitude and latitude so that they could be easily read into leaflet.
library(dplyr)
library(tidyr)

df <- read.csv("clean_data.csv") %>%
  janitor::clean_names() %>%
  # Recode the solar panel label as a logical; any other value becomes NA
  mutate(roofsolar = case_when(
    roofsolar == "SOLAR PANEL" ~ TRUE,
    roofsolar == "NO SOLAR PANEL" ~ FALSE
  )) %>%
  # Set roof shapes with a confidence score below 0.80 to NA
  mutate(roofshape = ifelse(roofshascr < 0.80, NA, roofshape)) %>%
  select(-c(roofshascr, roofcondit_discolordetect, roofcondit_discolorscore, roofcondit_discolorpercen, trampscr, roofcondit_tarppercen))
# Strip the "POINT (" and ")" wrapper, then split into longitude and latitude
df$rooftopgeo <- gsub("POINT \\(|\\)", "", df$rooftopgeo)
df <- df %>%
  separate(rooftopgeo, into = c("long", "lat"), sep = " ", convert = TRUE)
# Empty damage levels become NA
df$damage_level <- ifelse(df$damage_level == "", NA, df$damage_level)
df$roofshape <- factor(df$roofshape, levels = c("gable", "hip", "flat"))
levels_roofmateri <- c("metal", "shingle", "membrane", "shake", "tile")
df$roofmateri <- factor(df$roofmateri, levels = c("gravel", levels_roofmateri))
# Re-leveling without "gravel" turns gravel roofs into NA
df$roofmateri <- factor(df$roofmateri, levels = levels_roofmateri)
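The cleaning steps can be checked on a tiny made-up data frame. This is a minimal base R sketch: the three rows and the name `toy` are invented for illustration, and only the column names are taken from the code above.

```r
# Toy rows (hypothetical values) using the same column names as the real data
toy <- data.frame(
  roofsolar    = c("SOLAR PANEL", "NO SOLAR PANEL", "NO SOLAR PANEL"),
  roofshape    = c("gable", "hip", "flat"),
  roofshascr   = c(0.95, 0.60, 0.85),
  damage_level = c("Destroyed", "", "Major"),
  rooftopgeo   = c("POINT (-90.01 29.95)", "POINT (-90.02 29.96)",
                   "POINT (-90.03 29.97)"),
  stringsAsFactors = FALSE
)

# Solar panel label -> logical (this shortcut treats every other label as FALSE)
toy$roofsolar <- toy$roofsolar == "SOLAR PANEL"

# Roof shapes with a confidence score below 0.80 -> NA
toy$roofshape[toy$roofshascr < 0.80] <- NA

# Empty damage levels -> NA
toy$damage_level[toy$damage_level == ""] <- NA

# Strip the "POINT (" and ")" wrapper, then split on the space
coords <- gsub("POINT \\(|\\)", "", toy$rooftopgeo)
parts  <- strsplit(coords, " ", fixed = TRUE)
toy$long <- as.numeric(sapply(parts, `[`, 1))
toy$lat  <- as.numeric(sapply(parts, `[`, 2))
toy$rooftopgeo <- NULL

toy
```

The second row ends up with NA for both roofshape (score of 0.60) and damage_level (empty string), which is the behavior described above.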
The catastrophe scores are split into groups using cutoffs informed by the summary of the nonzero scores (for example, the median of the nonzero scores is 15). Scores of 0 are kept in their own group.
mostdamage <- df %>% filter(catastrophescore >= 50)
nodamage <- df %>% filter(catastrophescore == 0)
decimated <- df %>% filter(catastrophescore == 100)
middamage <- df %>% filter(catastrophescore < 50 & catastrophescore >= 15)
leastdamage <- df %>% filter(catastrophescore < 15 & catastrophescore >= 2)
minimaldamage <- df %>% filter(catastrophescore == 1)
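As an alternative to one filter per group, the same cutoffs can be expressed as a single binned column. This is a sketch in base R using `cut()`; the `scores` vector and the bin labels are made up for illustration, and it assumes the scores are whole numbers.

```r
# Hypothetical scores covering each boundary used in the filters above
scores <- c(0, 1, 2, 14, 15, 49, 50, 100)

# right = FALSE makes every bin closed on the left: [lower, upper)
bins <- cut(
  scores,
  breaks = c(0, 1, 2, 15, 50, Inf),
  labels = c("none", "minimal", "least", "mid", "most"),
  right  = FALSE
)

table(bins)
```

Note that the destroyed buildings (score of 100) fall inside the "most" bin here, just as `decimated` is a subset of `mostdamage` in the filters above.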
NOTE: Red indicates the buildings that were the most damaged (catastrophe score >= 50), orange indicates moderate damage (25 < catastrophe score < 50), and blue indicates lighter damage (catastrophe score <= 25, excluding scores of 0). Only 3,852 buildings had a nonzero catastrophe score; the majority of the buildings (37,967) had a catastrophe score of 0.
This shows all of the catastrophe scores; the vast majority of roofs have no damage.
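The color rule in the note can be written as a small helper. This is only a sketch: `score_color()` is a hypothetical function, not the code that produced the maps, and it uses the thresholds stated in the note.

```r
# Map a catastrophe score to the map color described in the note.
# Scores of 0 get NA so they can be dropped or drawn separately.
score_color <- function(score) {
  ifelse(score >= 50, "red",
    ifelse(score > 25, "orange",
      ifelse(score > 0, "blue", NA_character_)))
}

score_color(c(0, 10, 25, 26, 50, 100))
```

A vector like this could then be supplied as the `color` argument of a leaflet marker layer such as `addCircleMarkers()`.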
Map of the buildings that experienced the least damage:
Map of the buildings that experienced mid damage:
Map of the buildings that experienced the most damage (interactive!):
Map of the buildings that experienced no damage:
Map of the buildings that were completely destroyed:
Since most of the buildings in this dataset were not damaged by the tornado, the distribution of catastrophe scores is heavily skewed toward 0. This can be seen in the summary below:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.000   0.000   0.000   2.217   0.000 100.000
Because of this, I fit models that excluded catastrophe scores of 0 in order to look only at the structures that experienced damage. Below is the summary for the structures that exhibited damage:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    2.00    4.00   15.00   28.64   46.00  100.00
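The effect of dropping the zeros can be seen on a small invented vector. This sketch uses made-up scores, not the real data, and the names `scores` and `damaged` are illustrative.

```r
# Hypothetical scores: mostly zeros, with a few damaged buildings
scores <- c(0, 0, 0, 0, 0, 0, 2, 15, 46, 100)

# With the zeros included, the median sits at 0
summary(scores)

# Keep only buildings with a nonzero score before summarizing or modeling
damaged <- scores[scores > 0]
summary(damaged)
```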
Models
## # Comparison of Model Performance Indices
##
## Name | Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 | RMSE | Sigma
## ---------------------------------------------------------------------------------------------
## mods1 | glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods2 | glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods3 | glm | 26094.1 (>.999) | 26094.1 (>.999) | 26147.3 (>.999) | 0.147 | 27.816 | 27.857
## mods4 | glm | 30925.4 (<.001) | 30925.5 (<.001) | 30968.0 (<.001) | 0.156 | 28.795 | 28.821
## mods5 | glm | 30773.9 (<.001) | 30773.9 (<.001) | 30828.6 (<.001) | 0.176 | 28.402 | 28.437
Of the models I fit, Model 3 appeared to work best, with the lowest AIC and BIC in the comparison above. It should be noted, though, that none of these models fit particularly well with the variables used: the highest R2 in the table is 0.176.
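A comparison of this kind can be sketched with base R's `AIC()`, `BIC()`, and `nobs()`. The data below is simulated and the names `sim`, `m_small`, and `m_large` are hypothetical; this is not the code that produced the table above. One point worth checking in any such comparison is that information criteria are only comparable between models fit to the same observations.

```r
set.seed(1)
n <- 200
# Simulated predictors and response (hypothetical, for illustration only)
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 10 + 3 * sim$x1 + 2 * sim$x2 + rnorm(n, sd = 2)

m_small <- glm(y ~ x1, family = gaussian(link = "identity"), data = sim)
m_large <- glm(y ~ x1 + x2, family = gaussian(link = "identity"), data = sim)

# AIC/BIC are only comparable when both models were fit to the same rows
stopifnot(nobs(m_small) == nobs(m_large))

comparison <- data.frame(
  model = c("m_small", "m_large"),
  n     = c(nobs(m_small), nobs(m_large)),
  AIC   = c(AIC(m_small), AIC(m_large)),
  BIC   = c(BIC(m_small), BIC(m_large))
)
comparison
```

If the row counts differ, for instance because one model uses a predictor with more missing values, refitting every model on the shared complete cases puts the criteria on the same footing.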
Model 3
##
## Call:
## glm(formula = catastrophescore ~ long + roofmateri + rooftree +
## enclosure, family = gaussian(link = "identity"), data = extra)
##
## Deviance Residuals:
##    Min      1Q  Median      3Q     Max
## -75.62  -18.17   -8.69   13.16   82.23
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)        4460.13497 1149.62714   3.880 0.000107 ***
## long                 49.03688   12.76374   3.842 0.000124 ***
## roofmaterishingle   -23.29005    1.43088 -16.277  < 2e-16 ***
## roofmaterimembrane   12.42319    2.18262   5.692 1.37e-08 ***
## roofmaterishake     -21.90493    4.73851  -4.623 3.94e-06 ***
## roofmateritile      -22.84290    7.71403  -2.961 0.003087 **
## rooftree              0.56774    0.06876   8.257  < 2e-16 ***
## enclosureTRUE        44.80193   10.78228   4.155 3.34e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 808.6766)
##
## Null deviance: 3157376 on 3226 degrees of freedom
## Residual deviance: 2603130 on 3219 degrees of freedom
## (10 observations deleted due to missingness)
## AIC: 30774
##
## Number of Fisher Scoring iterations: 2
##                GVIF Df GVIF^(1/(2*Df))
## long       1.010340  1        1.005157
## roofmateri 1.020779  4        1.002574
## rooftree   1.012753  1        1.006356
## enclosure  1.004155  1        1.002076
Root mean squared error for Model 3
## [1] 27.41181
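For reference, root mean squared error is the square root of the average squared difference between the actual and predicted values. The sketch below computes it on simulated data; `rmse()`, `sim`, and `fit` are hypothetical names, and the numbers are unrelated to the value reported above.

```r
# RMSE: square root of the mean squared prediction error
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

set.seed(42)
# Simulated data (hypothetical) and a Gaussian GLM fit to it
sim <- data.frame(x = rnorm(100))
sim$y <- 5 + 2 * sim$x + rnorm(100, sd = 3)
fit <- glm(y ~ x, family = gaussian(link = "identity"), data = sim)

rmse_value <- rmse(sim$y, fitted(fit))
rmse_value
```

Because the score runs from 0 to 100, an RMSE can be read directly as a typical prediction error in score points.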
Based on Model 3, I have made model predictions:
Here is a comparison of the predicted vs the actual catastrophe score:
I then plotted the predicted catastrophe scores alongside the actual catastrophe scores for reference.
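A predicted-versus-actual comparison like this can be sketched with `predict()` and base graphics. Everything below is simulated for illustration (`sim`, `fit`, and the coefficients are made up), so it shows the mechanics rather than the actual figure.

```r
set.seed(7)
# Simulated predictor and a score clamped to the 0-100 range (hypothetical)
sim <- data.frame(x = runif(150, 0, 10))
sim$score <- pmin(100, pmax(0, 10 + 6 * sim$x + rnorm(150, sd = 15)))

fit <- glm(score ~ x, family = gaussian(link = "identity"), data = sim)

# Predictions on the response scale for the same rows
sim$predicted <- predict(fit, newdata = sim, type = "response")

plot(sim$score, sim$predicted,
     xlab = "Actual catastrophe score",
     ylab = "Predicted catastrophe score")
abline(a = 0, b = 1, lty = 2)  # a perfect prediction would fall on this line
```

The further the points scatter from the dashed line, the worse the fit. A Gaussian model with an identity link is also free to predict values below 0 or above 100, which is worth keeping in mind when reading such a plot.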
The variables included in this dataset were not sufficient to predict catastrophe scores accurately, as the graph above illustrates. More information would need to be considered, specifically data about the tornadoes themselves.